Cobain GIC

Overview:

This page contains the results of CoNGA analyses. Results in tables may have been filtered to reduce redundancy, focus on the most important columns, and limit length; full tables should exist as OUTFILE_PREFIX*.tsv files.

Command:

scripts/run_conga.py --all --gex_data /scratch.global/ben_testing/ben_tcr/Rotelle_GIC/outs/filtered_feature_bc_matrix.h5 --gex_data_type 10x_h5 --clones_file Rotelle_GIC_TCR --organism human --outfile_prefix Rotelle_GIC_Final2

Stats

num_cells_w_gex: 14913
num_features_start: 26530
num_cells_w_tcr: 733
min_genes_per_cell: 200
max_genes_per_cell: 3500
max_percent_mito: 0.1
num_filt_max_genes_per_cell: 91
num_filt_max_percent_mito: 0
num_antibody_features: 0
num_TR_genes: 43
num_TR_genes_in_hvg_set: 42
num_highly_variable_genes: 1670
num_cells_after_filtering: 642
num_clonotypes: 532
max_clonotype_size: 16
num_singleton_clonotypes: 467
nbr_frac_for_nndists: 0.1
num_gvg_hit_clonotypes: 11
num_gvg_hit_biclusters: 0

graph_vs_graph_stats


Here we assess overall graph-vs-graph correlation by counting the edges shared between the TCR and GEX neighbor graphs and comparing that observed number to the number expected if the graphs were completely uncorrelated. Our null model for uncorrelated graphs is to take the vertices of one graph and randomly renumber them (permute their labels). We compare the observed overlap to that expected under this null model by computing a Z-score, either by permuting the vertex labels of one graph many times to get the mean and standard deviation of the overlap distribution or, for large graphs where this is time-consuming, by using a regression model for the standard deviation.

The rows of this table correspond to the different graph-graph comparisons made in the CoNGA graph-vs-graph analysis. We compare K-nearest-neighbor (KNN) graphs for GEX and TCR at different K values ("nbr_frac", short for neighbor fraction, reports K as a fraction of the total number of clonotypes) to each other and to GEX and TCR "cluster" graphs, in which each clonotype is connected to all the other clonotypes with the same (GEX or TCR) cluster assignment. For two K values (the default), this gives 2*3=6 comparisons: GEX KNN graph vs TCR KNN graph, GEX cluster graph vs TCR KNN graph, and GEX KNN graph vs TCR cluster graph, for each of the two K values (nbr_fracs).

The column to look at is *overlap_zscore*. Higher values indicate more significant GEX/TCR covariation, with "interesting" levels starting around zscores of 3-5.
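The label-permutation null described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not CoNGA's own implementation: the function names, the representation of a graph as a set of directed (i, j) edges, and the toy ring graphs are all assumptions made for the example.

```python
import random
import statistics

def count_shared_edges(edges_a, edges_b):
    """Number of directed edges (i, j) present in both graphs."""
    return len(edges_a & edges_b)

def shuffled_overlap_zscore(edges_a, edges_b, num_nodes, num_shuffles=200, seed=0):
    """Z-score of the observed edge overlap against a label-permutation null."""
    rng = random.Random(seed)
    observed = count_shared_edges(edges_a, edges_b)
    null_overlaps = []
    for _ in range(num_shuffles):
        perm = list(range(num_nodes))
        rng.shuffle(perm)  # random renumbering of the vertices of graph B
        shuffled_b = {(perm[i], perm[j]) for i, j in edges_b}
        null_overlaps.append(count_shared_edges(edges_a, shuffled_b))
    mean = statistics.mean(null_overlaps)
    sdev = statistics.pstdev(null_overlaps)
    zscore = (observed - mean) / sdev if sdev > 0 else float("nan")
    return observed, mean, sdev, zscore

# Toy data (hypothetical): two identical directed ring graphs on 30 vertices,
# so every edge is shared and the Z-score should be large.
ring = {(i, (i + 1) % 30) for i in range(30)}
observed, mean, sdev, zscore = shuffled_overlap_zscore(ring, ring, num_nodes=30)
print(observed, round(mean, 2), round(sdev, 2), round(zscore, 1))
```

Permuting only one graph's labels preserves the degree structure of both graphs, which is why the null distribution reflects chance overlap rather than differences in graph density.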

Columns in more detail:

graph_overlap_type: KNN ("nbr") or cluster versus KNN ("nbr") or cluster

nbr_frac: the K value for the KNN graph, as a fraction of total clonotypes

overlap: the observed overlap (number of shared edges) between GEX and TCR graphs

expected_overlap: the expected overlap under a shuffled null model.

overlap_zscore: a Z-score for the observed overlap computed by subtracting the expected overlap and dividing by the standard deviation estimated from shuffling.
overlap expected_overlap overlap_mean overlap_sdev overlap_zscore overlap_zscore_fitted overlap_zscore_source nodes calculation_time calculation_time_fitted gex_edges tcr_edges gex_indegree_variance gex_indegree_skewness gex_indegree_kurtosis tcr_indegree_variance tcr_indegree_skewness tcr_indegree_kurtosis indegree_correlation_R indegree_correlation_P nbr_frac graph_overlap_type
41 25.047081 25.71 6.255070 2.444417 4.397082 shuffling 532 0.059355 0.001543 2660 2660 1.414689 2.431343 9.051368 0.383277 0.932702 0.989053 0.066905 0.123253 0.01 gex_nbr_vs_tcr_nbr
289 261.920904 259.89 17.754940 1.639544 1.748683 shuffling 532 0.216329 0.017912 2660 27816 1.414689 2.431343 9.051368 0.194624 0.616989 -0.808238 0.029297 0.500129 0.01 gex_nbr_vs_tcr_cluster
470 402.749529 403.14 22.576988 2.961422 5.014587 shuffling 532 0.317310 0.028074 42772 2660 0.125965 -0.635407 -0.830691 0.383277 0.932702 0.989053 0.036501 0.400791 0.01 gex_cluster_vs_tcr_nbr
3043 2814.290019 2809.03 85.511924 2.736110 2.820836 shuffling 532 0.251573 0.174766 28196 28196 1.126519 1.580659 2.593195 0.258051 1.777979 6.075861 0.064742 0.135874 0.10 gex_nbr_vs_tcr_nbr
2846 2776.361582 2786.90 83.231545 0.710067 0.931263 shuffling 532 0.256546 0.172307 28196 27816 1.126519 1.580659 2.593195 0.194624 0.616989 -0.808238 0.036406 0.402021 0.10 gex_nbr_vs_tcr_cluster
4440 4269.145009 4268.83 77.180186 2.217797 2.752895 shuffling 532 0.446019 0.270063 42772 28196 0.125965 -0.635407 -0.830691 0.258051 1.777979 6.075861 -0.033397 0.442066 0.10 gex_cluster_vs_tcr_nbr
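
As a worked check on how the columns relate, the sketch below recomputes two values from the first row of the table. The formulas are inferred from the tabulated numbers rather than taken from the CoNGA source, so treat them as assumptions: expected_overlap appears to equal gex_edges * tcr_edges / (nodes * (nodes - 1)), and, when overlap_zscore_source is "shuffling", overlap_zscore appears to equal (overlap - overlap_mean) / overlap_sdev.

```python
def expected_overlap(gex_edges, tcr_edges, nodes):
    """Expected number of shared directed edges if the two graphs were unrelated.

    Assumed formula: each of the nodes * (nodes - 1) possible directed edges
    is shared with probability (gex_edges / total) * (tcr_edges / total).
    """
    return gex_edges * tcr_edges / (nodes * (nodes - 1))

def overlap_zscore(overlap, overlap_mean, overlap_sdev):
    """Z-score of the observed overlap relative to the shuffled distribution."""
    return (overlap - overlap_mean) / overlap_sdev

# Values copied from the first table row (gex_nbr_vs_tcr_nbr, nbr_frac 0.01).
row = {
    "overlap": 41,
    "overlap_mean": 25.71,
    "overlap_sdev": 6.255070,
    "nodes": 532,
    "gex_edges": 2660,
    "tcr_edges": 2660,
}

exp_overlap = expected_overlap(row["gex_edges"], row["tcr_edges"], row["nodes"])
zscore = overlap_zscore(row["overlap"], row["overlap_mean"], row["overlap_sdev"])
print(round(exp_overlap, 6), round(zscore, 4))  # table reports 25.047081 and 2.444417
```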

graph_vs_graph


Graph vs graph analysis looks for correlation between GEX and TCR space by finding statistically significant overlap between two similarity graphs, one defined by GEX similarity and one by TCR sequence similarity.

Overlap is defined one node (clonotype) at a time by looking for overlap between that node's neighbors in the GEX graph and its neighbors in the TCR graph. The null model is that the two neighbor sets are chosen independently at random.

CoNGA looks at two kinds of graphs: K nearest neighbor (KNN) graphs, where K = neighborhood size is specified as a fraction of the number of clonotypes (defaults for K are 0.01 and 0.1), and cluster graphs, where each clonotype is connected to all the other clonotypes in the same (GEX or TCR) cluster. Overlaps are computed 3 ways (GEX KNN vs TCR KNN, GEX KNN vs TCR cluster, and GEX cluster vs TCR KNN), for each of the K values (called nbr_fracs short for neighbor fractions).
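
To make the neighbor-fraction convention concrete, the sketch below converts the two default nbr_fracs into K for this dataset (532 clonotypes). Truncating to an integer is an assumption, chosen because it reproduces the num_neighbors values (5 and 53) and the KNN edge counts (2660 and 28196) reported in the tables on this page; the function name is hypothetical.

```python
def neighborhood_size(nbr_frac, num_clonotypes):
    """K for the KNN graph: nbr_frac times the clonotype count, truncated, at least 1."""
    return max(1, int(nbr_frac * num_clonotypes))

num_clonotypes = 532  # num_clonotypes from the Stats section
sizes = {}
for nbr_frac in (0.01, 0.1):
    k = neighborhood_size(nbr_frac, num_clonotypes)
    sizes[nbr_frac] = (k, k * num_clonotypes)  # (K, directed edges in the KNN graph)
    print(nbr_frac, k, k * num_clonotypes)
# prints:
# 0.01 5 2660
# 0.1 53 28196
```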

Columns (depend slightly on whether hit is KNN v KNN or KNN v cluster):

conga_score: P value for GEX/TCR overlap * number of clonotypes

mait_fraction: fraction of the overlap made up of 'invariant' T cells

num_neighbors*: size of neighborhood (K)

cluster_size: size of cluster (for KNN v cluster graph overlaps)

clone_index: 0-index of clonotype in adata object
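
The per-clonotype null model (two neighbor sets chosen independently at random) corresponds to a hypergeometric tail probability. The following is a minimal sketch under stated assumptions, not CoNGA's actual code: it takes the population to be the other num_clonotypes - 1 clonotypes and multiplies the P value by the number of clonotypes, as the conga_score definition above describes. The function names are hypothetical, and the result is not claimed to reproduce the tabulated scores exactly.

```python
from math import comb

def hypergeom_sf(k, population, marked, draws):
    """P(X >= k) when `draws` items are taken without replacement from
    `population` items, `marked` of which count as successes."""
    upper = min(marked, draws)
    favorable = sum(
        comb(marked, x) * comb(population - marked, draws - x)
        for x in range(k, upper + 1)
    )
    return favorable / comb(population, draws)

def conga_score_sketch(overlap, num_neighbors_gex, num_neighbors_tcr, num_clonotypes):
    """Overlap P value times the number of clonotypes (a Bonferroni-style correction)."""
    # Assumption: neighbors are drawn from the other num_clonotypes - 1 clonotypes.
    pvalue = hypergeom_sf(
        overlap, num_clonotypes - 1, num_neighbors_gex, num_neighbors_tcr
    )
    return pvalue * num_clonotypes

# Inputs taken from the first row of the table below (clone_index 112).
score = conga_score_sketch(
    overlap=14, num_neighbors_gex=53, num_neighbors_tcr=53, num_clonotypes=532
)
print(score)
```

Because the score is a P value scaled by the number of clonotypes, values can exceed 1; smaller scores indicate stronger GEX/TCR neighborhood overlap.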


conga_score num_neighbors_gex num_neighbors_tcr overlap overlap_corrected mait_fraction clone_index nbr_frac graph_overlap_type cluster_size gex_cluster tcr_cluster va ja cdr3a vb jb cdr3b
0.120163 53.0 53.0 14 14 0.0 112 0.10 gex_nbr_vs_tcr_nbr NaN 1 0 TRAV13-2*01 TRAJ45*01 CAETGGSANRLTF TRBV27*01 TRBJ2-7*01 CASSFARTQYEQYF
0.134907 NaN 5.0 5 5 0.0 422 0.01 gex_cluster_vs_tcr_nbr 103.0 1 7 TRAV6*01 TRAJ37*01 CAPVSNTGKLIF TRBV7-2*01 TRBJ1-4*01 CASSLPGGLGEKLFF
0.373793 5.0 5.0 2 2 0.0 34 0.01 gex_nbr_vs_tcr_nbr NaN 7 0 TRAV12-1*01 TRAJ33*01 CAVRLDSNYQLIW TRBV3-2*01 TRBJ2-1*01 CASRAAGGSNEQFF
0.373793 5.0 5.0 2 2 0.0 225 0.01 gex_nbr_vs_tcr_nbr NaN 5 5 TRAV20*01 TRAJ11*01 CAVPRDGTLTF TRBV5-7*01 TRBJ2-4*01 CASSFWTGEQNTQYF
0.378032 5.0 5.0 2 2 1.0 14 0.01 gex_nbr_vs_tcr_nbr NaN 4 9 TRAV1-2*01 TRAJ33*01 CAVRDSNYQLIW TRBV10-1*01 TRBJ2-6*01 CASSEEGGSGASVLTF
0.378032 5.0 5.0 2 2 1.0 16 0.01 gex_nbr_vs_tcr_nbr NaN 4 9 TRAV1-2*01 TRAJ33*01 CAVRDSNYQLIW TRBV4-3*01 TRBJ2-6*01 CASSQEGGGASVLTF
0.417430 NaN 53.0 14 14 0.0 35 0.10 gex_cluster_vs_tcr_nbr 59.0 4 10 TRAV12-1*01 TRAJ34*01 CAVRPRHNANKLIF TRBV7-2*01 TRBJ2-2*01 CASSQQRLGGLANTAQLFF
0.556993 NaN 53.0 9 9 0.0 434 0.10 gex_cluster_vs_tcr_nbr 29.0 6 1 TRAV8-2*01 TRAJ27*01 CAVNDMRADKLTF TRBV13*01 TRBJ2-5*01 CASSSSAEKGTQYF
0.696650 53.0 NaN 7 7 0.0 271 0.10 gex_nbr_vs_tcr_cluster 19.0 3 11 TRAV21*01 TRAJ47*01 CAVRPDYGNKLIF TRBV13*01 TRBJ2-3*01 CASSPLPATDPQYF
0.696650 53.0 NaN 7 7 0.0 283 0.10 gex_nbr_vs_tcr_cluster 19.0 3 11 TRAV24*01 TRAJ9*01 CAFGTGGFKTVF TRBV15*01 TRBJ1-5*01 CASSKQTGGSNQPQYF
0.726670 NaN 5.0 3 3 0.0 434 0.01 gex_cluster_vs_tcr_nbr 29.0 6 1 TRAV8-2*01 TRAJ27*01 CAVNDMRADKLTF TRBV13*01 TRBJ2-5*01 CASSSSAEKGTQYF
0.802195 NaN 53.0 16 16 0.0 425 0.10 gex_cluster_vs_tcr_nbr 77.0 3 7 TRAV6*01 TRAJ41*01 CAAEDSGYALNF TRBV5-8*01 TRBJ2-1*01 CASSWIGTGGSTEQFF

tcr_clumping


This table stores the results of the TCR "clumping" analysis, which looks for neighborhoods in TCR space with more TCRs than expected by chance under a simple null model of VDJ rearrangement.

For each TCR in the dataset, we count how many TCRs are within a set of fixed TCRdist radii (defaults: 24, 48, 72, 96) and compare that number to the number expected given the size of the dataset, using a Poisson model. This analysis was inspired by the ALICE and TCRnet methods.
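
A minimal sketch of the Poisson comparison follows, using the first row of the table below (num_nbrs 2, expected_num_nbrs 0.000334). The multiplier used for pvalue_adj, the number of clonotypes times the number of radii (532 * 4), is an assumption inferred from that row rather than taken from the CoNGA source, and the function name is hypothetical.

```python
from math import exp, factorial

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summing the upper tail term by term."""
    term = exp(-lam) * lam ** k / factorial(k)
    total = 0.0
    i = k
    while total == 0.0 or term > 1e-18 * total:
        total += term
        i += 1
        term *= lam / i  # next Poisson term: lam**i / i!
        if term == 0.0:
            break
    return total

num_clonotypes = 532           # from the Stats section
radii = (24, 48, 72, 96)       # default TCRdist radii

raw_pvalue = poisson_sf(2, 0.000334)
# Assumed correction: one test per clonotype per radius.
pvalue_adj = raw_pvalue * num_clonotypes * len(radii)
print(pvalue_adj)  # the table reports 0.000119 for this row
```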

Columns:

clump_type: 'global' unless we are optionally looking for TCR clumps within the individual GEX clusters

num_nbrs: neighborhood size (number of other TCRs with TCRdist

clump_type clone_index nbr_radius pvalue_adj num_nbrs expected_num_nbrs raw_count va ja cdr3a vb jb cdr3b clonotype_fdr_value clumping_group clusters_gex clusters_tcr
global 87 48 0.000119 2 0.000334 1572.0 TRAV13-1*01 TRAJ36*01 CAAKDEVNNLFF TRBV10-1*01 TRBJ2-7*01 CASKDGYEQYF 0.000118 1 3 0
global 89 48 0.000328 2 0.000555 2615.0 TRAV13-1*01 TRAJ36*01 CAAKDGVNNLFF TRBV10-1*01 TRBJ2-7*01 CASTEAYEQYF 0.000118 1 2 0
global 88 48 0.000355 2 0.000578 2721.0 TRAV13-1*01 TRAJ36*01 CAAKDGVNNLFF TRBV10-1*01 TRBJ2-7*01 CASTLGYEQYF 0.000118 1 2 0
global 183 48 0.003560 2 0.001830 8617.0 TRAV19*01 TRAJ39*01 CALNERNNAGNVLTF TRBV6-3*01 TRBJ2-3*01 CASSYSRGLSDPQYF 0.000890 2 2 2
global 239 96 0.011321 3 0.031976 150545.0 TRAV20*01 TRAJ43*01 CAHNDIRF TRBV9*01 TRBJ2-1*01 CASSLWGELNEQFF 0.002127 3 2 5
global 515 24 0.013108 1 0.000006 29.0 TRAV9-2*01 TRAJ36*01 CALTQTGVNNLFF TRBV20-1*01 TRBJ1-2*01 CSARDPRRTDYTF 0.002127 6 4 3
global 87 72 0.014886 2 0.003745 17632.0 TRAV13-1*01 TRAJ36*01 CAAKDEVNNLFF TRBV10-1*01 TRBJ2-7*01 CASKDGYEQYF 0.000118 1 3 0
global 89 72 0.025253 2 0.004880 22974.0 TRAV13-1*01 TRAJ36*01 CAAKDGVNNLFF TRBV10-1*01 TRBJ2-7*01 CASTEAYEQYF 0.000118 1 2 0
global 88 72 0.028254 2 0.005162 24303.0 TRAV13-1*01 TRAJ36*01 CAAKDGVNNLFF TRBV10-1*01 TRBJ2-7*01 CASTLGYEQYF 0.000118 1 2 0
global 514 24 0.036611 1 0.000017 81.0 TRAV9-2*01 TRAJ36*01 CALSQTGVNNLFF TRBV20-1*01 TRBJ1-2*01 CSARDPRATDYTF 0.003661 6 4 3
global 331 96 0.043454 2 0.006404 30152.0 TRAV38-1*01 TRAJ27*01 CAFINTNADKLTF TRBV13*01 TRBJ2-4*01 CASSLLVGGRENTQYF 0.003950 7 0 11
global 264 24 0.068249 1 0.000032 151.0 TRAV21*01 TRAJ31*01 CAVRNNNDRVIF TRBV7-6*01 TRBJ1-1*01 CASSFSRNTEAFF 0.005250 4 1 10
global 263 24 0.068249 1 0.000032 151.0 TRAV21*01 TRAJ31*01 CAARNNNDRVIF TRBV7-6*01 TRBJ1-1*01 CASSFSRNTEAFF 0.005250 4 2 10
global 403 72 0.086964 2 0.009068 42693.0 TRAV6*01 TRAJ23*01 CALAYNQAGKLIF TRBV6-3*01 TRBJ2-5*01 CASSSLEETQYF 0.006212 5 2 7
global 89 24 0.112542 1 0.000053 249.0 TRAV13-1*01 TRAJ36*01 CAAKDGVNNLFF TRBV10-1*01 TRBJ2-7*01 CASTEAYEQYF 0.000118 1 2 0
global 88 24 0.120677 1 0.000057 267.0 TRAV13-1*01 TRAJ36*01 CAAKDGVNNLFF TRBV10-1*01 TRBJ2-7*01 CASTLGYEQYF 0.000118 1 2 0
global 240 96 0.147130 2 0.011806 55582.0 TRAV20*01 TRAJ43*01 CAVPVRF TRBV9*01 TRBJ1-5*01 CASSPWGEDQPQYF 0.008260 3 3 5
global 183 96 0.148889 3 0.076316 359305.0 TRAV19*01 TRAJ39*01 CALNERNNAGNVLTF TRBV6-3*01 TRBJ2-3*01 CASSYSRGLSDPQYF 0.000890 2 2 2
global 179 96 0.156943 2 0.012195 57413.0 TRAV19*01 TRAJ39*01 CALNERNNAGNVLTF TRBV6-8*01 TRBJ2-1*01 CGSSYSRTGANNEQFF 0.008260 2 2 2
global 515 48 0.217395 1 0.000102 481.0 TRAV9-2*01 TRAJ36*01 CALTQTGVNNLFF TRBV20-1*01 TRBJ1-2*01 CSARDPRRTDYTF 0.002127 6 4 3
global 242 96 0.224534 2 0.014598 68727.0 TRAV20*01 TRAJ43*01 CAVQDIRF TRBV9*01 TRBJ1-1*01 CASSLWGENTEAFF 0.010692 3 2 5
global 183 72 0.235749 2 0.014960 70431.0 TRAV19*01 TRAJ39*01 CALNERNNAGNVLTF TRBV6-3*01 TRBJ2-3*01 CASSYSRGLSDPQYF 0.000890 2 2 2
global 391 96 0.352987 2 0.018326 86279.0 TRAV5*01 TRAJ32*01 CAENYGGSGNKLIF TRBV13*01 TRBJ2-3*01 CASSWLLGGTDPQYF 0.015347 8 3 11
global 183 24 0.397260 1 0.000187 879.0 TRAV19*01 TRAJ39*01 CALNERNNAGNVLTF TRBV6-3*01 TRBJ2-3*01 CASSYSRGLSDPQYF 0.000890 2 2 2
global 242 48 0.418047 1 0.000196 925.0 TRAV20*01 TRAJ43*01 CAVQDIRF TRBV9*01 TRBJ1-1*01 CASSLWGENTEAFF 0.010692 3 2 5
global 405 48 0.458266 1 0.000215 1014.0 TRAV6*01 TRAJ23*01 CALSYNQAGKLIF TRBV6-3*01 TRBJ2-5*01 CASGNLQETQYF 0.017376 5 2 7
global 182 24 0.469145 1 0.000220 1042.0 TRAV19*01 TRAJ39*01 CALNERNNAGNVLTF TRBV6-3*01 TRBJ2-3*01 CASSYSRGLTDPQYF 0.017376 2 2 2
global 514 48 0.504358 1 0.000237 1116.0 TRAV9-2*01 TRAJ36*01 CALSQTGVNNLFF TRBV20-1*01 TRBJ1-2*01 CSARDPRATDYTF 0.003661 6 4 3
global 263 48 0.616873 1 0.000290 1365.0 TRAV21*01 TRAJ31*01 CAARNNNDRVIF TRBV7-6*01 TRBJ1-1*01 CASSFSRNTEAFF 0.005250 4 2 10
global 264 48 0.616873 1 0.000290 1365.0 TRAV21*01 TRAJ31*01 CAVRNNNDRVIF TRBV7-6*01 TRBJ1-1*01 CASSFSRNTEAFF 0.005250 4 1 10
global 87 96 0.653167 2 0.024984 117625.0 TRAV13-1*01 TRAJ36*01 CAAKDEVNNLFF TRBV10-1*01 TRBJ2-7*01 CASKDGYEQYF 0.000118 1 3 0
global 405 96 0.793866 2 0.027567 129788.0 TRAV6*01 TRAJ23*01 CALSYNQAGKLIF TRBV6-3*01 TRBJ2-5*01 CASGNLQETQYF 0.017376 5 2 7

tcr_clumping_logos


This figure summarizes the results of a CoNGA analysis that produces scores (TCR clumping) and clusters. At the top are six 2D UMAP projections of clonotypes in the dataset based on GEX similarity (top left three panels) and TCR similarity (top right three panels), colored from left to right by GEX cluster assignment; TCR clumping score; joint GEX:TCR cluster assignment for clonotypes with significant TCR clumping scores, using a bicolored disk whose left half indicates GEX cluster and whose right half indicates TCR cluster; TCR cluster; TCR clumping; GEX:TCR cluster assignments for TCR clumping hits, as in the third panel.

Below are two rows of GEX landscape plots colored by (first row, left) expression of selected marker genes, (second row, left) Z-score normalized and GEX-neighborhood averaged expression of the same marker genes, and (both rows, right) TCR sequence features (see CoNGA manuscript Table S3 for TCR feature descriptions).

GEX and TCR sequence features of TCR clumping hits in clusters with 3 or more hits are summarized by a series of logo-style visualizations, from left to right: differentially expressed genes (DEGs); TCR sequence logos showing the V and J gene usage and CDR3 sequences for the TCR alpha and beta chains; biased TCR sequence scores, with red indicating elevated scores and blue indicating decreased scores relative to the rest of the dataset (see CoNGA manuscript Table S3 for score definitions); GEX 'logos' for each cluster consisting of a panel of marker genes shown with red disks colored by mean expression and sized according to the fraction of cells expressing the gene (gene names are given above).

DEG and TCRseq sequence logos are scaled by the adjusted P value of the associations, with full logo height requiring a top adjusted P value below 10^-6. DEGs with fold-change less than 2 are shown in gray. Each cluster is indicated by a bicolored disk colored according to GEX cluster (left half) and TCR cluster (right half). The two numbers above each disk show the number of hits within the cluster (on the left) and the total number of cells in those clonotypes (on the right). The dendrogram at the left shows similarity relationships among the clusters based on connections in the GEX and TCR neighbor graphs.

The choice of which marker genes to use for the GEX umap panels and for the cluster GEX logos can be configured using run_conga.py command line flags or arguments to the conga.plotting.make_logo_plots function.
Image source: Rotelle_GIC_Final2_tcr_clumping_logos.png

tcr_db_match


This table stores significant matches between TCRs in adata and TCRs in the file /scratch.global/ben_testing/conga/conga/data/new_paired_tcr_db_for_matching_nr.tsv

P values of matches are assigned by turning the raw TCRdist score into a P value based on a model of the V(D)J rearrangement process, so matches between TCRs that are very far from germline (for example) are assigned a higher significance.

Columns:

tcrdist: TCRdist distance between the two TCRs (adata query and db hit)

pvalue_adj: raw P value of the match * num query TCRs * num db TCRs

fdr_value: Benjamini-Hochberg FDR value for match

clone_index: index within adata of the query TCR clonotype

db_index: index of the hit in the database being matched

va,ja,cdr3a,vb,jb,cdr3b

db_XXX: where XXX is a field in the literature database
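
The two corrections named in the column list can be sketched as follows. This is an illustration only: the function names and the example P values are hypothetical, and whether CoNGA applies the Benjamini-Hochberg procedure to exactly these raw P values is an assumption.

```python
def adjusted_match_pvalue(raw_pvalue, num_query_tcrs, num_db_tcrs):
    """pvalue_adj as described above: raw P value times the number of comparisons."""
    return raw_pvalue * num_query_tcrs * num_db_tcrs

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    fdr = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        fdr[idx] = running_min
    return fdr

# Hypothetical raw P values for four matches.
fdr_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(fdr_values)
```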



tcr_graph_vs_gex_features


This table has results from a graph-vs-features analysis in which we look for genes that are differentially expressed (elevated) in specific neighborhoods of the TCR neighbor graph. Differential expression is assessed by a ttest first, for speed, and then by a mannwhitneyu test for nbrhood/score combinations whose ttest P-value passes an initial threshold (default is 10* the pvalue threshold).

Each row of the table represents a single significant association, in other words a neighborhood (defined by the central clonotype index) and a gene.

The columns are as follows:

ttest_pvalue_adj: ttest_pvalue * number of comparisons

mwu_pvalue_adj: mannwhitney-U P-value * number of comparisons

log2enr: log2 fold change of gene in neighborhood (will be positive)

gex_cluster: the consensus GEX cluster of the clonotypes w/ biased scores

tcr_cluster: the consensus TCR cluster of the clonotypes w/ biased scores

num_fg: the number of clonotypes in the neighborhood (including center)

mean_fg: the mean value of the feature in the neighborhood

mean_bg: the mean value of the feature outside the neighborhood

feature: the name of the gene

mait_fraction: the fraction of the skewed clonotypes that have an invariant TCR

clone_index: the index in the anndata dataset of the clonotype that is the center of the neighborhood.
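
The two-stage screen described above (a fast t-test prefilter followed by a Mann-Whitney U test) can be sketched as control flow with pluggable test functions. This is a hedged illustration, not CoNGA's implementation: the function and parameter names are hypothetical, the toy P values are made up, and the rule that a hit is kept when either adjusted P value passes the threshold is an assumption based on rows in these tables where one of the two adjusted values exceeds 1.

```python
def two_stage_screen(candidates, fast_pvalue, slow_pvalue, num_comparisons,
                     pvalue_threshold=1.0, prefilter_factor=10.0):
    """Run the slow test only for candidates whose fast adjusted P value
    is within prefilter_factor times the threshold."""
    hits = []
    for cand in candidates:
        fast_adj = fast_pvalue(cand) * num_comparisons
        if fast_adj > prefilter_factor * pvalue_threshold:
            continue  # fails the prefilter, so the slower test is skipped
        slow_adj = slow_pvalue(cand) * num_comparisons
        if min(fast_adj, slow_adj) <= pvalue_threshold:
            hits.append({"name": cand["name"],
                         "ttest_pvalue_adj": fast_adj,
                         "mwu_pvalue_adj": slow_adj})
    return hits

# Toy candidates with precomputed raw P values (hypothetical numbers).
toy = [
    {"name": "a", "t_p": 1e-6, "mwu_p": 1e-9},  # passes both tests
    {"name": "b", "t_p": 5e-3, "mwu_p": 1e-8},  # passes prefilter, rescued by slow test
    {"name": "c", "t_p": 0.5, "mwu_p": 1e-8},   # fails prefilter, slow test never runs
    {"name": "d", "t_p": 5e-3, "mwu_p": 0.2},   # passes prefilter, fails both thresholds
]
slow_calls = []

def slow(cand):
    slow_calls.append(cand["name"])
    return cand["mwu_p"]

hits = two_stage_screen(toy, lambda c: c["t_p"], slow, num_comparisons=1000)
print([h["name"] for h in hits], slow_calls)
```

Because both adjusted values are raw P values multiplied by the number of comparisons and are not capped, entries greater than 1 in the tables are expected.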


ttest_pvalue_adj mwu_pvalue_adj log2enr gex_cluster tcr_cluster feature mean_fg mean_bg num_fg clone_index mait_fraction nbr_frac graph_type feature_type
3.637877e-13 1.259886e-78 8.876144 3 7 ENSMMUG00000054409 2.968077 0.038523 33 -1 0.0 0.00 tcr_cluster gex
4.330276e-20 2.475967e-37 6.343509 2 8 ENSMMUG00000061119 2.803675 0.174741 30 -1 0.0 0.00 tcr_cluster gex
6.740658e-03 4.197116e-35 6.343996 2 7 ENSMMUG00000054409 1.696959 0.053418 54 399 0.0 0.10 tcr_nbr gex
3.616694e-02 2.214514e-31 5.810826 2 7 ENSMMUG00000054409 1.581101 0.066507 54 404 0.0 0.10 tcr_nbr gex
5.524648e-02 3.436715e-29 5.899136 3 7 ENSMMUG00000054409 1.601228 0.064233 54 402 0.0 0.10 tcr_nbr gex
1.564020e-01 1.070536e-28 5.546289 2 7 ENSMMUG00000054409 1.518759 0.073550 54 419 0.0 0.10 tcr_nbr gex
1.117727e-01 1.560612e-28 5.569513 2 7 ENSMMUG00000054409 1.524350 0.072918 54 421 0.0 0.10 tcr_nbr gex
1.072842e-01 1.860159e-28 5.571641 2 7 ENSMMUG00000054409 1.524861 0.072860 54 422 0.0 0.10 tcr_nbr gex
2.804220e-01 4.843464e-26 5.457238 2 7 ENSMMUG00000054409 1.497120 0.075994 54 417 0.0 0.10 tcr_nbr gex
1.775188e-01 1.238556e-25 5.462943 2 7 ENSMMUG00000054409 1.498515 0.075837 54 407 0.0 0.10 tcr_nbr gex
8.846428e-01 4.604450e-23 5.112633 2 7 ENSMMUG00000054409 1.410601 0.085768 54 408 0.0 0.10 tcr_nbr gex
9.328553e-01 1.026424e-22 5.057032 3 7 ENSMMUG00000054409 1.396264 0.087388 54 418 0.0 0.10 tcr_nbr gex
1.309988e+00 5.989549e-21 5.132640 2 7 ENSMMUG00000054409 1.415736 0.085188 54 406 0.0 0.10 tcr_nbr gex
1.481972e+00 1.612634e-20 5.029298 3 7 ENSMMUG00000054409 1.389076 0.088200 54 423 0.0 0.10 tcr_nbr gex
2.931209e-04 3.823810e-19 2.501481 0 2 ENSMMUG00000061119 0.951813 0.247530 57 -1 0.0 0.00 tcr_cluster gex
3.435755e+00 2.149080e-18 4.922413 2 7 ENSMMUG00000054409 1.361163 0.091353 54 414 0.0 0.10 tcr_nbr gex
3.285024e+00 2.149080e-18 4.917222 2 7 ENSMMUG00000054409 1.359799 0.091507 54 403 0.0 0.10 tcr_nbr gex
3.335224e+00 2.288763e-18 4.909072 2 7 ENSMMUG00000054409 1.357656 0.091750 54 405 0.0 0.10 tcr_nbr gex
3.389983e+00 3.069146e-18 4.880029 2 7 ENSMMUG00000054409 1.350005 0.092614 54 397 0.0 0.10 tcr_nbr gex
3.860440e+00 4.199074e-18 4.827127 3 7 ENSMMUG00000054409 1.336010 0.094195 54 398 0.0 0.10 tcr_nbr gex
3.572116e+00 4.859018e-18 4.833613 2 7 ENSMMUG00000054409 1.337729 0.094001 54 427 0.0 0.10 tcr_nbr gex
3.933236e+00 5.621568e-18 4.796497 3 7 ENSMMUG00000054409 1.327874 0.095114 54 425 0.0 0.10 tcr_nbr gex
5.550704e+00 1.755062e-17 4.551553 3 7 ENSMMUG00000054409 1.262026 0.102553 54 411 0.0 0.10 tcr_nbr gex
2.649942e-09 8.465841e-17 4.280124 2 7 ENSMMUG00000056515 3.054797 0.713226 54 313 0.0 0.10 tcr_nbr gex
1.089627e-08 8.958174e-17 4.308333 2 4 ENSMMUG00000056515 3.070184 0.711488 54 212 0.0 0.10 tcr_nbr gex
3.592379e-02 2.390806e-16 2.487117 0 2 ENSMMUG00000061119 0.956520 0.251418 54 187 0.0 0.10 tcr_nbr gex
7.090872e+00 2.836648e-16 4.820641 3 7 ENSMMUG00000054409 1.334289 0.094389 54 410 0.0 0.10 tcr_nbr gex
1.893355e-09 3.523686e-16 4.323484 2 7 ENSMMUG00000056515 3.078450 0.710554 54 109 0.0 0.10 tcr_nbr gex
8.125980e-02 5.935875e-16 2.423377 0 2 ENSMMUG00000061119 0.935882 0.253750 54 177 0.0 0.10 tcr_nbr gex
1.043029e-01 1.005052e-15 2.668576 1 2 ENSMMUG00000061119 1.016149 0.244682 54 200 0.0 0.10 tcr_nbr gex
8.291229e-08 6.401231e-15 4.098499 2 7 ENSMMUG00000056515 2.955863 0.724403 54 412 0.0 0.10 tcr_nbr gex
5.006807e-08 1.275900e-14 4.169137 2 7 ENSMMUG00000056515 2.994312 0.720059 54 277 0.0 0.10 tcr_nbr gex
1.643447e-07 1.420431e-14 4.115502 2 4 ENSMMUG00000056515 2.965114 0.723357 54 222 0.0 0.10 tcr_nbr gex
5.138232e-01 4.277048e-14 2.573013 0 2 ENSMMUG00000061119 0.984590 0.248247 54 162 0.0 0.10 tcr_nbr gex
1.149504e+00 1.472213e-13 2.417194 0 2 ENSMMUG00000061119 0.933889 0.253975 54 179 0.0 0.10 tcr_nbr gex
5.304837e-01 1.675102e-13 2.355310 0 2 ENSMMUG00000061119 0.914030 0.256218 54 160 0.0 0.10 tcr_nbr gex
4.001114e-07 2.380009e-13 4.056406 2 4 ENSMMUG00000056515 2.932970 0.726989 54 55 0.0 0.10 tcr_nbr gex
7.951065e-01 4.733059e-13 2.244093 0 2 ENSMMUG00000061119 0.878763 0.260203 54 196 0.0 0.10 tcr_nbr gex
4.891390e-05 1.930714e-12 3.767220 2 4 ENSMMUG00000056515 2.776131 0.744707 54 320 0.0 0.10 tcr_nbr gex
7.886987e+00 2.056620e-12 7.482114 3 7 ENSMMUG00000054409 3.600625 0.181684 6 399 0.0 0.01 tcr_nbr gex
8.158782e+00 2.166197e-12 7.472014 3 7 ENSMMUG00000054409 3.594243 0.181757 6 397 0.0 0.01 tcr_nbr gex
6.733624e+00 2.665200e-12 7.419899 3 7 ENSMMUG00000054409 3.561323 0.182133 6 412 0.0 0.01 tcr_nbr gex
4.574165e+00 2.955696e-12 7.486300 3 7 ENSMMUG00000054409 3.603271 0.181654 6 402 0.0 0.01 tcr_nbr gex
9.128151e+00 3.277424e-12 7.391793 3 7 ENSMMUG00000054409 3.543579 0.182335 6 415 0.0 0.01 tcr_nbr gex
1.203081e+00 3.527318e-12 2.308465 1 2 ENSMMUG00000061119 0.899108 0.257904 54 184 0.0 0.10 tcr_nbr gex
2.526534e+00 3.848063e-12 2.324406 0 2 ENSMMUG00000061119 0.904175 0.257332 54 193 0.0 0.10 tcr_nbr gex
4.742955e-06 4.282523e-12 3.853640 0 7 ENSMMUG00000056515 2.822914 0.739422 54 413 0.0 0.10 tcr_nbr gex
5.670211e-05 6.391235e-12 3.708465 2 7 ENSMMUG00000056515 2.744373 0.748295 54 126 0.0 0.10 tcr_nbr gex
2.339934e-05 6.922410e-12 3.757844 2 7 ENSMMUG00000056515 2.771061 0.745280 54 28 0.0 0.10 tcr_nbr gex
1.752553e-05 8.879416e-12 3.852103 2 4 ENSMMUG00000056515 2.822081 0.739516 54 415 0.0 0.10 tcr_nbr gex
Omitted 82 lines

tcr_graph_vs_gex_features_plot


This plot summarizes the results of a graph versus features analysis by labeling the clonotypes at the center of each biased neighborhood with the name of the feature biased in that neighborhood. The feature names are drawn in colored boxes whose color is determined by the strength and direction of the feature score bias (from bright red for features that are strongly elevated to bright blue for features that are strongly decreased in the corresponding neighborhoods, relative to the rest of the dataset).

At most one feature (the top scoring) is shown for each clonotype (ie, neighborhood). The UMAP xy coordinates for this plot are stored in adata.obsm['X_tcr_2d']. The score used for ranking correlations is 'mwu_pvalue_adj'. The threshold score for displaying a feature is 1.0. The feature column is 'feature'. Since we also run graph-vs-features using "neighbor" graphs that are defined by clusters, ie where each clonotype is connected to all the other clonotypes in the same cluster, some biased features may be associated with a cluster rather than a specific clonotype. Those features are labeled with a '*' at the end and shown near the centroid of the clonotypes belonging to that cluster.
Image source: Rotelle_GIC_Final2_tcr_graph_vs_gex_features_plot.png

tcr_graph_vs_gex_features_panels


Graph-versus-feature analysis was used to identify a set of GEX features that showed biased distributions in TCR neighborhoods. This plot shows the distribution of the top-scoring GEX features on the TCR UMAP 2D landscape. The features are ranked by 'mwu_pvalue_adj', i.e., the Mann-Whitney-Wilcoxon adjusted P value (raw P value * number of comparisons). At most 3 features from clonotype neighborhoods in each (GEX,TCR) cluster pair are shown. The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel. Points are plotted in order of increasing feature score.
Image source: Rotelle_GIC_Final2_tcr_graph_vs_gex_features_panels.png

tcr_genes_vs_gex_features


This table has results from a graph-vs-features analysis in which we look for genes that are differentially expressed (elevated) in specific neighborhoods of the TCR neighbor graph. Differential expression is assessed by a ttest first, for speed, and then by a mannwhitneyu test for nbrhood/score combinations whose ttest P-value passes an initial threshold (default is 10* the pvalue threshold).

Each row of the table represents a single significant association, in other words a neighborhood (defined by the central clonotype index) and a gene.

The columns are as follows:

ttest_pvalue_adj: ttest_pvalue * number of comparisons

mwu_pvalue_adj: mannwhitney-U P-value * number of comparisons

log2enr: log2 fold change of gene in neighborhood (will be positive)

gex_cluster: the consensus GEX cluster of the clonotypes w/ biased scores

tcr_cluster: the consensus TCR cluster of the clonotypes w/ biased scores

num_fg: the number of clonotypes in the neighborhood (including center)

mean_fg: the mean value of the feature in the neighborhood

mean_bg: the mean value of the feature outside the neighborhood

feature: the name of the gene

mait_fraction: the fraction of the skewed clonotypes that have an invariant TCR

clone_index: the index in the anndata dataset of the clonotype that is the center of the neighborhood.

In this analysis the TCR graph is defined by connecting all clonotypes that share the same VA, JA, VB, or JB gene segment (the analysis is run four times, once for each gene segment type).
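
A gene-segment graph of this kind can be sketched by grouping clonotypes on one segment and linking every pair within a group. The example below is a minimal illustration using the VA genes of the clonotypes listed in the graph_vs_graph table on this page; the function name and the choice to drop the allele suffix (for example "*01") are assumptions, the latter suggested by the gene_segment column of the table below.

```python
from collections import defaultdict

def gene_segment_neighbors(clonotype_genes):
    """Map each clonotype index to the other clonotypes using the same gene segment."""
    groups = defaultdict(list)
    for clone_index, gene in clonotype_genes.items():
        groups[gene.split("*")[0]].append(clone_index)  # drop the allele suffix
    return {
        clone_index: [other for other in groups[gene.split("*")[0]]
                      if other != clone_index]
        for clone_index, gene in clonotype_genes.items()
    }

# clone_index -> VA gene, copied from the graph_vs_graph table.
va_genes = {
    112: "TRAV13-2*01", 422: "TRAV6*01", 34: "TRAV12-1*01", 225: "TRAV20*01",
    14: "TRAV1-2*01", 16: "TRAV1-2*01", 35: "TRAV12-1*01", 434: "TRAV8-2*01",
    271: "TRAV21*01", 283: "TRAV24*01", 425: "TRAV6*01",
}
neighbors = gene_segment_neighbors(va_genes)
num_edges = sum(len(others) for others in neighbors.values())
print(neighbors[14], neighbors[422], neighbors[112], num_edges)
```

Rows from this analysis have clone_index -1 in the table because each association belongs to a whole gene-segment group rather than to a neighborhood centered on one clonotype.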
ttest_pvalue_adj mwu_pvalue_adj log2enr gex_cluster tcr_cluster feature mean_fg mean_bg num_fg clone_index mait_fraction gene_segment graph_type feature_type
7.484724e-15 8.969480e-92 11.094260 3 5 ENSMMUG00000049767 3.527550 0.015000 18 -1 0.0 TRBV5-8 tcr_genes gex
1.055481e-18 5.106906e-84 9.171846 2 7 ENSMMUG00000054409 3.159566 0.038370 31 -1 0.0 TRAV6 tcr_genes gex
8.999149e-17 1.240647e-79 10.937514 0 1 ENSMMUG00000063185 3.774666 0.021480 24 -1 0.0 TRBV4-2 tcr_genes gex
4.632209e-13 1.034259e-64 9.338192 7 3 ENSMMUG00000062211 4.019811 0.081115 21 -1 0.0 TRBV12-2 tcr_genes gex
2.928629e-01 2.430952e-63 8.472871 6 1 ENSMMUG00000056910 2.271786 0.024183 11 -1 0.0 TRAV16 tcr_genes gex
6.316045e-13 2.839198e-51 8.579260 6 1 ENSMMUG00000060662 3.342130 0.068893 15 -1 0.0 TRAV8-7 tcr_genes gex
1.107452e-12 9.851018e-46 8.352445 0 4 ENSMMUG00000062085 3.413647 0.086066 21 -1 0.2 TRBV4-3 tcr_genes gex
1.855973e-02 1.242521e-39 9.308708 2 4 ENSMMUG00000059325 2.991692 0.029397 8 -1 0.0 TRAV25 tcr_genes gex
8.302680e-06 1.467056e-37 6.595270 0 1 ENSMMUG00000057062 1.951733 0.060604 15 -1 0.0 TRAV8-3 tcr_genes gex
1.226911e-19 7.015239e-37 6.343509 2 8 ENSMMUG00000061119 2.803675 0.174741 30 -1 0.0 TRAV18 tcr_genes gex
2.178532e-106 1.600235e-36 5.592459 2 7 ENSMMUG00000056515 3.728990 0.610970 58 -1 0.0 TRBV6-3 tcr_genes gex
1.344410e-10 1.066049e-35 7.692902 7 0 ENSMMUG00000043894 3.622388 0.162158 19 -1 0.0 TRBV20-1 tcr_genes gex
3.968732e-01 3.184652e-34 5.313467 6 1 ENSMMUG00000061081 1.256408 0.061273 19 -1 0.0 TRAV8-2 tcr_genes gex
6.087152e+00 5.246990e-34 8.162003 0 3 ENSMMUG00000061255 2.457652 0.036600 7 -1 0.0 TRBV5-6 tcr_genes gex
3.710291e-10 5.752428e-29 7.097247 2 2 ENSMMUG00000051385 3.573801 0.225597 23 -1 0.0 TRBV7-6 tcr_genes gex
9.155726e-05 2.140831e-26 7.509862 6 0 ENSMMUG00000062974 2.936639 0.093444 12 -1 0.0 TRAV13-2 tcr_genes gex
7.397531e-03 3.309419e-23 7.249660 2 8 ENSMMUG00000062211 3.442710 0.181418 9 -1 0.0 TRBV12-3 tcr_genes gex
3.152396e-05 3.535373e-22 2.727790 0 2 ENSMMUG00000061119 1.043334 0.244951 52 -1 0.0 TRAV19 tcr_genes gex
1.739503e-02 1.019891e-21 7.777559 2 4 ENSMMUG00000065017 3.209041 0.102794 9 -1 0.0 TRAV12-1 tcr_genes gex
1.221002e-34 8.738056e-17 5.363096 2 4 ENSMMUG00000056515 3.915709 0.786193 28 -1 0.0 TRBV6-2 tcr_genes gex
2.784245e-02 6.241326e-12 6.818597 2 1 ENSMMUG00000043894 3.442513 0.237543 8 -1 0.0 TRBV21-1 tcr_genes gex
7.080497e-03 9.877314e-07 6.670205 2 10 ENSMMUG00000051385 3.695223 0.326019 7 -1 0.0 TRBV7-4 tcr_genes gex
1.119380e-01 7.839223e-05 4.851182 4 10 ENSMMUG00000051385 2.545002 0.341355 7 -1 0.0 TRBV5-6 tcr_genes gex
3.726081e-06 7.382818e-04 4.167548 2 1 ENSMMUG00000056515 3.269982 0.878995 16 -1 0.0 TRBV10-2 tcr_genes gex
1.901853e-08 1.002423e-03 2.294030 4 5 ENSMMUG00000056515 2.074884 0.883734 30 -1 0.0 TRBV9 tcr_genes gex
5.107602e+00 1.005481e-03 6.362424 0 5 ENSMMUG00000043894 3.249662 0.263284 4 -1 0.0 TRBV19 tcr_genes gex
8.124517e-01 4.614008e-02 1.227987 5 4 WASHC3 0.765917 0.399690 64 -1 0.0 TRBJ2-4 tcr_genes gex
4.676707e+00 7.089933e-02 2.693193 7 9 MAP2K3 1.100824 0.270232 9 -1 0.0 TRBV6-1 tcr_genes gex
3.887516e-77 4.397661e-01 7.171084 3 2 ENSMMUG00000051385 4.114374 0.349118 3 -1 0.0 TRBV7-7 tcr_genes gex
4.484605e-01 2.270749e+00 0.888811 0 0 SLA 1.824650 1.337273 56 -1 0.0 TRBJ2-3 tcr_genes gex
7.139158e-01 3.073687e+00 3.521512 7 1 TMEM135 0.964564 0.132244 3 -1 0.0 TRAV8-4 tcr_genes gex
8.406671e-01 8.569616e+00 3.399464 7 1 CSTF3 0.964564 0.143120 3 -1 0.0 TRAV8-4 tcr_genes gex

tcr_genes_vs_gex_features_panels


Graph-versus-feature analysis was used to identify a set of GEX features that showed biased distributions in TCR neighborhoods. This plot shows the distribution of the top-scoring GEX features on the TCR UMAP 2D landscape. The features are ranked by 'mwu_pvalue_adj', i.e., the Mann-Whitney-Wilcoxon adjusted P value (raw P value * number of comparisons). At most 3 features from clonotype neighborhoods in each (GEX,TCR) cluster pair are shown. The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel. Points are plotted in order of increasing feature score.
Image source: Rotelle_GIC_Final2_tcr_genes_vs_gex_features_panels.png

gex_graph_vs_tcr_features


This table has results from a graph-vs-features analysis in which we look at the distribution of a set of TCR-defined features over the GEX neighbor graph. We look for neighborhoods in the graph that have biased score distributions, as assessed by a ttest first, for speed, and then by a mannwhitneyu test for nbrhood/score combinations whose ttest P-value passes an initial threshold (default is 10* the pvalue threshold).

Each row of the table represents a single significant association, in other words a neighborhood (defined by the central clonotype index) and a tcr feature.

The columns are as follows:

ttest_pvalue_adj: ttest_pvalue * number of comparisons

ttest_stat: ttest statistic (sign indicates whether feature is up or down)

mwu_pvalue_adj: mannwhitney-U P-value * number of comparisons

gex_cluster: the consensus GEX cluster of the clonotypes w/ biased scores

tcr_cluster: the consensus TCR cluster of the clonotypes w/ biased scores

num_fg: the number of clonotypes in the neighborhood (including center)

mean_fg: the mean value of the feature in the neighborhood

mean_bg: the mean value of the feature outside the neighborhood

feature: the name of the TCR score

mait_fraction: the fraction of the skewed clonotypes that have an invariant TCR

clone_index: the index in the anndata dataset of the clonotype that is the center of the neighborhood.


ttest_pvalue_adj ttest_stat mwu_pvalue_adj gex_cluster tcr_cluster num_fg mean_fg mean_bg feature mait_fraction clone_index nbr_frac graph_type feature_type
3.636894e-27 2.435638e+06 1.562678e-112 4 9 6 1.000000 0.000000 mait 1.000000 12 0.01 gex_nbr tcr
0.000000e+00 1.845495e+02 9.519093e-46 4 9 6 1.000000 0.015209 TRAV1-2 1.000000 12 0.01 gex_nbr tcr
3.867305e-01 -4.951298e+00 2.348437e-01 2 0 54 0.506821 0.795354 af4 0.000000 26 0.10 gex_nbr tcr
9.933085e-02 -4.471693e+00 4.333296e-01 7 0 29 -0.126722 0.222531 cd8 0.000000 -1 0.00 gex_cluster tcr
2.098956e-01 -5.082872e+00 7.249505e-01 2 0 54 0.523511 0.793468 af4 0.153846 12 0.10 gex_nbr tcr
1.007617e+00 3.463887e+00 9.733440e-01 4 9 60 -5.399413 -5.587273 mjenergy 0.266667 -1 0.00 gex_cluster tcr
4.460173e-01 3.724043e+00 9.859175e-01 4 5 60 0.720707 0.700891 nndists_tcr 0.066667 -1 0.00 gex_cluster tcr
4.178701e-02 -4.250926e+00 3.201596e+00 3 6 78 0.064103 0.207048 TRBJ2-1 0.000000 -1 0.00 gex_cluster tcr
1.389785e-11 -7.971102e+00 5.620059e+00 4 6 60 0.000000 0.118644 TRBJ2-3 0.000000 -1 0.00 gex_cluster tcr

gex_graph_vs_tcr_features_plot


This plot summarizes the results of a graph versus features analysis by labeling the clonotypes at the center of each biased neighborhood with the name of the feature biased in that neighborhood. The feature names are drawn in colored boxes whose color is determined by the strength and direction of the feature score bias (from bright red for features that are strongly elevated to bright blue for features that are strongly decreased in the corresponding neighborhoods, relative to the rest of the dataset).

At most one feature (the top scoring) is shown for each clonotype (ie, neighborhood). The UMAP xy coordinates for this plot are stored in adata.obsm['X_gex_2d']. The score used for ranking correlations is 'mwu_pvalue_adj'. The threshold score for displaying a feature is 1.0. The feature column is 'feature'. Since we also run graph-vs-features using "neighbor" graphs that are defined by clusters, ie where each clonotype is connected to all the other clonotypes in the same cluster, some biased features may be associated with a cluster rather than a specific clonotype. Those features are labeled with a '*' at the end and shown near the centroid of the clonotypes belonging to that cluster.
Image source: Rotelle_GIC_Final2_gex_graph_vs_tcr_features_plot.png
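
The label selection described for this plot (at most one top-scoring feature per neighborhood, a display threshold of 1.0 on 'mwu_pvalue_adj', and a trailing '*' for cluster-level hits) could be sketched as below. The rows are copied from the table above; the function name, the grouping key used for cluster-level rows, and treating clone_index = -1 as the marker of a cluster-level row are assumptions for illustration, not CoNGA's implementation.

```python
# (mwu_pvalue_adj, feature, clone_index, gex_cluster, tcr_cluster), from the table above
ROWS = [
    (1.562678e-112, "mait", 12, 4, 9),
    (9.519093e-46, "TRAV1-2", 12, 4, 9),
    (2.348437e-01, "af4", 26, 2, 0),
    (4.333296e-01, "cd8", -1, 7, 0),
    (7.249505e-01, "af4", 12, 2, 0),
    (3.201596e+00, "TRBJ2-1", -1, 3, 6),
]


def top_feature_labels(rows, threshold=1.0):
    """Keep the best-scoring (lowest adjusted P) feature per neighborhood."""
    best = {}
    for score, feature, clone_index, gex_cluster, tcr_cluster in rows:
        if score > threshold:
            continue  # too weak to display
        is_cluster = clone_index < 0  # assumption: -1 marks a cluster-level row
        if is_cluster:
            key = ("cluster", gex_cluster, tcr_cluster)  # hypothetical grouping key
            label = feature + "*"
        else:
            key = ("clone", clone_index)
            label = feature
        if key not in best or score < best[key][0]:
            best[key] = (score, label)
    return {key: label for key, (score, label) in best.items()}


labels = top_feature_labels(ROWS)
```

With these rows, clonotype 12 is labeled with 'mait' (its strongest hit), clonotype 26 with 'af4', the cluster-level 'cd8' hit gets a trailing '*', and 'TRBJ2-1' is dropped because its score exceeds the threshold.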

gex_graph_vs_tcr_features_panels


Graph-versus-feature analysis was used to identify a set of TCR features that showed biased distributions in GEX neighborhoods. This plot shows the distribution of the top-scoring TCR features on the GEX UMAP 2D landscape. The features are ranked by 'mwu_pvalue_adj', i.e., the Mann-Whitney-Wilcoxon adjusted P value (raw P value * number of comparisons). At most 3 features from clonotype neighborhoods in each (GEX,TCR) cluster pair are shown. The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel. Points are plotted in order of increasing feature score.
Image source: Rotelle_GIC_Final2_gex_graph_vs_tcr_features_panels.png

graph_vs_features_gex_clustermap


This plot shows the distribution of significant features from graph-vs-features or HotSpot analysis plotted across the GEX landscape. Rows are features and columns are individual clonotypes. Columns are ordered by hierarchical clustering (if a dendrogram is present above the heatmap) or by a 1D UMAP projection (used for very large datasets or if 'X_pca_gex' is not present in adata.obsm_keys()). Rows are ordered by hierarchical clustering with a correlation metric.

The row colors to the left of the heatmap show the feature type (blue=TCR, orange=GEX). The row colors to the left of those indicate the strength of the graph-vs-feature correlation (also included in the feature labels to the right of the heatmap; keep in mind that highly significant P values for some features may shift the colorscale so everything else looks dark blue).

The column colors above the heatmap are GEX clusters (and TCR V/J genes if plotting against the TCR landscape). The text above the column colors provides more info.

Feature scores are Z-score normalized and then averaged over the K=53 nearest neighbors (0 means no nbr-averaging).

The 'coolwarm' colormap is centered at Z=0.

Since features of the same type (GEX or TCR) as the landscape and neighbor graph (ie GEX features) are more highly correlated over graph neighborhoods, their neighbor-averaged scores will show more extreme variation. For this reason, the nbr-averaged scores for these features from the same modality as the landscape itself are downscaled by a factor of rescale_factor_for_self_features=0.33.

The colormap in the top left is for the Z-score normalized, neighbor-averaged scores (multiply by 3.03 to get the color scores for the GEX features).


Image source: Rotelle_GIC_Final2_graph_vs_features_gex_clustermap.png
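
The score processing described for the clustermaps (Z-score normalization, neighbor averaging, then downscaling of same-modality features by 0.33) can be illustrated with the sketch below. It is not taken from CoNGA: the function names are hypothetical, and the use of the population standard deviation and the inclusion of the central clonotype in each average are assumptions.

```python
from statistics import mean, pstdev


def zscore(values):
    """Z-score normalize one feature across all clonotypes."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0.0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def nbr_average(scores, nbrs):
    """Average each clonotype's score with those of its neighbors.

    nbrs[i] lists the neighbor indices of clonotype i; including clonotype i
    itself in the average is an assumption of this sketch.
    """
    return [
        (scores[i] + sum(scores[j] for j in nbrs[i])) / (1 + len(nbrs[i]))
        for i in range(len(scores))
    ]


def clustermap_scores(values, nbrs, is_self_feature, rescale_factor_for_self_features=0.33):
    """Z-score, neighbor-average, and downscale features of the landscape's own modality."""
    averaged = nbr_average(zscore(values), nbrs)
    if is_self_feature:
        averaged = [rescale_factor_for_self_features * s for s in averaged]
    return averaged


# Toy example: four clonotypes, each with a single neighbor.
toy_nbrs = [[1], [0], [3], [2]]
other_modality = clustermap_scores([0.0, 0.0, 2.0, 2.0], toy_nbrs, is_self_feature=False)
same_modality = clustermap_scores([0.0, 0.0, 2.0, 2.0], toy_nbrs, is_self_feature=True)
```

The factor of 3.03 quoted for reading the colormap is the reciprocal of the rescale factor (1 / 0.33 ≈ 3.03), so multiplying a displayed same-modality score by it recovers the unscaled neighbor-averaged Z-score.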

graph_vs_features_tcr_clustermap


This plot shows the distribution of significant features from graph-vs-features or HotSpot analysis plotted across the TCR landscape. Rows are features and columns are individual clonotypes. Columns are ordered by hierarchical clustering (if a dendrogram is present above the heatmap) or by a 1D UMAP projection (used for very large datasets or if 'X_pca_tcr' is not present in adata.obsm_keys()). Rows are ordered by hierarchical clustering with a correlation metric.

The row colors to the left of the heatmap show the feature type (blue=TCR, orange=GEX). The row colors to the left of those indicate the strength of the graph-vs-feature correlation (also included in the feature labels to the right of the heatmap; keep in mind that highly significant P values for some features may shift the colorscale so everything else looks dark blue).

The column colors above the heatmap are TCR clusters (and TCR V/J genes if plotting against the TCR landscape). The text above the column colors provides more info.

Feature scores are Z-score normalized and then averaged over the K=53 nearest neighbors (0 means no nbr-averaging).

The 'coolwarm' colormap is centered at Z=0.

Since features of the same type (GEX or TCR) as the landscape and neighbor graph (ie TCR features) are more highly correlated over graph neighborhoods, their neighbor-averaged scores will show more extreme variation. For this reason, the nbr-averaged scores for these features from the same modality as the landscape itself are downscaled by a factor of rescale_factor_for_self_features=0.33.

The colormap in the top left is for the Z-score normalized, neighbor-averaged scores (multiply by 3.03 to get the color scores for the TCR features).


Image source: Rotelle_GIC_Final2_graph_vs_features_tcr_clustermap.png

graph_vs_summary


Summary figure for the graph-vs-graph and graph-vs-features analyses.
Image source: Rotelle_GIC_Final2_graph_vs_summary.png

gex_clusters_tcrdist_trees


These are TCRdist hierarchical clustering trees for the GEX clusters (cluster assignments stored in adata.obs['clusters_gex']). The trees are colored by CoNGA score with a color score range of 5.32e+00 (blue) to 5.32e-09 (red). For coloring, CoNGA scores are log-transformed, negated, and square-rooted (with an offset in there, too, roughly speaking).
Image source: Rotelle_GIC_Final2_gex_clusters_tcrdist_trees.png

conga_threshold_tcrdist_tree


This is a TCRdist hierarchical clustering tree for the clonotypes with CoNGA score less than 10.0. The tree is colored by CoNGA score with a color score range of 1.00e+01 (blue) to 1.00e-08 (red). For coloring, CoNGA scores are log-transformed, negated, and square-rooted (with an offset in there, too, roughly speaking).
Image source: Rotelle_GIC_Final2_conga_threshold_tcrdist_tree.png

hotspot_features


Find GEX (TCR) features that show a biased distribution across the TCR (GEX) neighbor graph, using a simplified version of the Hotspot method from the Yosef lab.

DeTomaso, D., & Yosef, N. (2021). "Hotspot identifies informative gene modules across modalities of single-cell genomics." Cell Systems, 12(5), 446–456.e9.

PMID:33951459

Columns:

Z: HotSpot Z statistic

pvalue_adj: Raw P value times the number of tests (crude Bonferroni correction)

nbr_frac: The K NN nbr fraction used for the neighbor graph construction (nbr_frac = 0.1 means K=0.1*num_clonotypes neighbors)
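
As a worked example of the nbr_frac definition, the sketch below converts the two neighbor fractions used on this page into neighbor counts for the 532 clonotypes reported in the Stats section. The function name and the truncation to an integer (with a floor of 1) are assumptions; the page only states K = nbr_frac * num_clonotypes.

```python
def num_neighbors(nbr_frac, num_clonotypes):
    """K for the KNN graph; truncating to an integer is an assumption of this sketch."""
    return max(1, int(nbr_frac * num_clonotypes))


k_large = num_neighbors(0.10, 532)  # 0.10 * 532 = 53.2
k_small = num_neighbors(0.01, 532)  # 0.01 * 532 = 5.32
```

Under this assumption the values are K=53 and K=5, which agrees with the K=53 quoted in the clustermap descriptions and with the num_fg values of 54 and 6 (neighbors plus the central clonotype) in the graph-vs-features table.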


Z pvalue_adj feature feature_type nbr_frac
49.336595 0.000000e+00 ENSMMUG00000056515 gex 0.10
40.232142 0.000000e+00 ENSMMUG00000054409 gex 0.10
37.780389 0.000000e+00 ENSMMUG00000061119 gex 0.10
30.231293 6.114312e-197 ENSMMUG00000054409 gex 0.01
23.005884 2.707485e-113 ENSMMUG00000061119 gex 0.01
20.949730 1.255227e-93 ENSMMUG00000056515 gex 0.01
19.998144 3.802338e-85 ENSMMUG00000062211 gex 0.10
19.768175 4.393179e-85 mait tcr 0.01
18.625673 1.323592e-73 ENSMMUG00000063185 gex 0.10
16.644333 2.212818e-58 ENSMMUG00000051385 gex 0.10
16.490307 2.865384e-57 ENSMMUG00000049767 gex 0.10
16.440923 6.480665e-57 ENSMMUG00000043894 gex 0.10
15.019460 3.642073e-47 RORC gex 0.01
14.974161 7.205824e-47 ENSMMUG00000062085 gex 0.10
13.580694 3.467705e-38 ENSMMUG00000063185 gex 0.01
12.154972 3.590778e-30 ENSMMUG00000062211 gex 0.01
12.041434 1.431390e-29 ENSMMUG00000043894 gex 0.01
11.386613 3.243373e-26 ENSMMUG00000057062 gex 0.10
11.078364 1.062752e-24 ENSMMUG00000059325 gex 0.10
11.061355 1.284877e-24 ENSMMUG00000060662 gex 0.10
10.571318 2.692498e-22 BLK gex 0.01
10.533641 4.021138e-22 ENSMMUG00000057062 gex 0.01
10.423756 1.284975e-21 ENSMMUG00000051385 gex 0.01
9.397822 3.705435e-17 ENSMMUG00000059234 gex 0.01
8.959303 2.173455e-15 ENSMMUG00000062085 gex 0.01
8.951750 2.327481e-15 ENSMMUG00000060662 gex 0.01
8.771732 1.170346e-14 KLRB1 gex 0.01
8.690750 2.395083e-14 ENSMMUG00000065017 gex 0.01
8.649237 3.448732e-14 ENSMMUG00000061255 gex 0.10
8.565631 7.149958e-14 TYROBP gex 0.01
8.352788 4.434904e-13 ENSMMUG00000049767 gex 0.01
7.703842 1.036472e-12 TRAV1-2 tcr 0.01
7.995739 8.567680e-12 ENSMMUG00000061255 gex 0.01
7.391101 9.686542e-10 ENSMMUG00000061081 gex 0.10
7.316706 1.690483e-09 ENSMMUG00000052673 gex 0.10
7.273446 2.331063e-09 ENSMMUG00000056431 gex 0.10
7.221398 3.422829e-09 ENSMMUG00000061499 gex 0.01
7.077887 9.736488e-09 STAP1 gex 0.01
6.457199 7.094916e-07 ENSMMUG00000016687 gex 0.01
5.719503 8.386610e-07 mait tcr 0.10
6.393141 1.081032e-06 KNTC1 gex 0.01
6.348860 1.442928e-06 ENSMMUG00000065017 gex 0.10
5.406902 5.033755e-06 TRAV12-1 tcr 0.01
6.007525 1.253080e-05 ENSMMUG00000062974 gex 0.10
5.910594 2.267504e-05 ENSMMUG00000059325 gex 0.01
5.892373 2.532342e-05 C1H1orf53 gex 0.01
5.764445 5.449768e-05 ENSMMUG00000049338 gex 0.01
5.762043 5.527917e-05 ENSMMUG00000056431 gex 0.01
5.467192 3.041440e-04 ADA2 gex 0.01
5.392620 4.618999e-04 ENSMMUG00000056910 gex 0.10
Omitted 15 lines

hotspot_gex_umap


HotSpot analysis (Nir Yosef lab, PMID: 33951459) was used to identify a set of GEX (TCR) features that showed biased distributions in TCR (GEX) space. This plot shows the distribution of the top-scoring HotSpot features on the GEX UMAP 2D landscape. The features are ranked by adjusted P value (raw P value * number of comparisons). The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel.

Features are filtered based on correlation coefficient to reduce redundancy: if a feature has a correlation of >= 0.9 (the max_feature_correlation argument to conga.plotting.plot_hotspot_umap) to a previously plotted feature, that feature is skipped. Points are plotted in order of increasing feature score.
Image source: Rotelle_GIC_Final2_hotspot_combo_features_0.100_nbrs_gex_plot_umap_nbr_avg.png
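
The redundancy filter described above amounts to a greedy pass over the ranked features. The sketch below shows the idea with a plain Pearson correlation; apart from max_feature_correlation, which is named in the text, the function names, feature names, and data are hypothetical, and whether CoNGA compares signed or absolute correlations is not stated on this page (signed is assumed here).

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length score vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant feature is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def filter_redundant(ranked_features, max_feature_correlation=0.9):
    """Walk features from best to worst, skipping any that track an already kept one.

    ranked_features: list of (name, scores) pairs, best-ranked first.
    """
    kept = []
    for name, scores in ranked_features:
        if any(pearson(scores, prev) >= max_feature_correlation for _, prev in kept):
            continue  # too similar to a feature that is already plotted
        kept.append((name, scores))
    return [name for name, _ in kept]


# Hypothetical features: feat_b closely tracks feat_a, feat_c does not.
ranked = [
    ("feat_a", [1.0, 2.0, 3.0, 4.0]),
    ("feat_b", [2.0, 4.0, 6.0, 8.1]),
    ("feat_c", [4.0, 1.0, 3.0, 2.0]),
]
plotted = filter_redundant(ranked)
```

Because the pass is greedy, the outcome depends on the ranking: the best-ranked member of a group of correlated features is the one that gets plotted.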

hotspot_gex_clustermap


This plot shows the distribution of significant features from graph-vs-features or HotSpot analysis plotted across the GEX landscape. Rows are features and columns are individual clonotypes. Columns are ordered by hierarchical clustering (if a dendrogram is present above the heatmap) or by a 1D UMAP projection (used for very large datasets or if 'X_pca_gex' is not present in adata.obsm_keys()). Rows are ordered by hierarchical clustering with a correlation metric.

The row colors to the left of the heatmap show the feature type (blue=TCR, orange=GEX). The row colors to the left of those indicate the strength of the graph-vs-feature correlation (also included in the feature labels to the right of the heatmap; keep in mind that highly significant P values for some features may shift the colorscale so everything else looks dark blue).

The column colors above the heatmap are GEX clusters (and TCR V/J genes if plotting against the TCR landscape). The text above the column colors provides more info.

Feature scores are Z-score normalized and then averaged over the K=53 nearest neighbors (0 means no nbr-averaging).

The 'coolwarm' colormap is centered at Z=0.

Since features of the same type (GEX or TCR) as the landscape and neighbor graph (ie GEX features) are more highly correlated over graph neighborhoods, their neighbor-averaged scores will show more extreme variation. For this reason, the nbr-averaged scores for these features from the same modality as the landscape itself are downscaled by a factor of rescale_factor_for_self_features=0.33.

The colormap in the top left is for the Z-score normalized, neighbor-averaged scores (multiply by 3.03 to get the color scores for the GEX features).


Image source: Rotelle_GIC_Final2_hotspot_combo_features_0.100_nbrs_gex_plot_clustermap_nbr_avg.png

hotspot_tcr_umap


HotSpot analysis (Nir Yosef lab, PMID: 33951459) was used to identify a set of GEX (TCR) features that showed biased distributions in TCR (GEX) space. This plot shows the distribution of the top-scoring HotSpot features on the TCR UMAP 2D landscape. The features are ranked by adjusted P value (raw P value * number of comparisons). The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel.

Features are filtered based on correlation coefficient to reduce redundancy: if a feature has a correlation of >= 0.9 (the max_feature_correlation argument to conga.plotting.plot_hotspot_umap) to a previously plotted feature, that feature is skipped. Points are plotted in order of increasing feature score.
Image source: Rotelle_GIC_Final2_hotspot_combo_features_0.100_nbrs_tcr_plot_umap_nbr_avg.png

hotspot_tcr_clustermap


This plot shows the distribution of significant features from graph-vs-features or HotSpot analysis plotted across the TCR landscape. Rows are features and columns are individual clonotypes. Columns are ordered by hierarchical clustering (if a dendrogram is present above the heatmap) or by a 1D UMAP projection (used for very large datasets or if 'X_pca_tcr' is not present in adata.obsm_keys()). Rows are ordered by hierarchical clustering with a correlation metric.

The row colors to the left of the heatmap show the feature type (blue=TCR, orange=GEX). The row colors to the left of those indicate the strength of the graph-vs-feature correlation (also included in the feature labels to the right of the heatmap; keep in mind that highly significant P values for some features may shift the colorscale so everything else looks dark blue).

The column colors above the heatmap are TCR clusters (and TCR V/J genes if plotting against the TCR landscape). The text above the column colors provides more info.

Feature scores are Z-score normalized and then averaged over the K=53 nearest neighbors (0 means no nbr-averaging).

The 'coolwarm' colormap is centered at Z=0.

Since features of the same type (GEX or TCR) as the landscape and neighbor graph (ie TCR features) are more highly correlated over graph neighborhoods, their neighbor-averaged scores will show more extreme variation. For this reason, the nbr-averaged scores for these features from the same modality as the landscape itself are downscaled by a factor of rescale_factor_for_self_features=0.33.

The colormap in the top left is for the Z-score normalized, neighbor-averaged scores (multiply by 3.03 to get the color scores for the TCR features).


Image source: Rotelle_GIC_Final2_hotspot_combo_features_0.100_nbrs_tcr_plot_clustermap_nbr_avg.png